AMENDMENTS TO THE CLAIMS

This listing of claims will replace all prior versions, and listings, of claims in the application: 

Listing of Claims:

1. (Currently amended) A method of determining cluster attractors for a plurality of documents,
each document comprising at least one term, each term comprising one or more words, the 
method comprising: calculating, in respect of each term, a probability distribution indicative of 
the frequency of occurrence of [[the]] one other term in the instance where a document comprises 
said term and said one other term, [[or]] and in the instance where a document comprises said
term and more than one other term, the respective frequency of occurrence of each [[,]] other
term that co-occurs with said term in at least one of said documents; calculating, in respect of
each term, the entropy of the respective probability distribution; selecting at least one of said 
probability distributions as a cluster attractor depending on the respective entropy value. 
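
By way of illustration only, the three steps recited in Claim 1 might be sketched as follows. The function names, the representation of a document as a list of terms, and the rule "keep distributions whose entropy does not exceed `max_entropy`" are assumptions made for this sketch; Claim 1 itself only requires that selection depend on the entropy value.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_distributions(docs):
    """docs: list of documents, each a list of terms.
    Returns {term: {other_term: probability}} built from co-occurrence counts."""
    counts = defaultdict(Counter)
    for doc in docs:
        tf = Counter(doc)
        for term in tf:
            for other, n in tf.items():
                if other != term:
                    # Add the occurrences of `other` in a document containing `term`.
                    counts[term][other] += n
    dists = {}
    for term, c in counts.items():
        total = sum(c.values())
        dists[term] = {o: n / total for o, n in c.items()}
    return dists

def entropy(dist):
    """Shannon entropy, in bits, of a {term: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def select_attractors(docs, max_entropy):
    """Assumed selection rule: keep distributions with entropy <= max_entropy."""
    dists = cooccurrence_distributions(docs)
    return {t: d for t, d in dists.items() if entropy(d) <= max_entropy}
```

A term that never shares a document with another term produces no distribution in this sketch and so cannot be selected.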

2. (Original) A method as claimed in Claim 1, wherein each probability distribution comprises,
in respect of each co-occurring term, an indicator that is indicative of the total number of 
instances of the respective co-occurring term in all of the documents in which the respective
co-occurring term co-occurs with the term in respect of which the probability distribution is
calculated. 

3. (Previously presented) A method as claimed in Claim 1, wherein each probability distribution 
comprises, in respect of each co-occurring term, an indicator comprising a conditional probability 
of the occurrence of the respective co-occurring term in a document given the appearance in said 
document of the term in respect of which the probability distribution is calculated. 

4. (Previously presented) A method as claimed in Claim 2, wherein each indicator is normalized 
with respect to the total number of terms in the document, or each [[,]] document in which the
term in respect of which the probability distribution is calculated appears. 
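
One possible reading of Claims 2 and 4 taken together is sketched below: the count of Claim 2 for each co-occurring term is divided by the total number of term occurrences in the documents that contain the given term. The function name `normalized_indicators` and the list-of-terms document representation are hypothetical, and other readings of the normalization are possible.

```python
from collections import Counter

def normalized_indicators(docs, term):
    """Co-occurrence counts for `term`, divided by the total number of term
    occurrences in the documents that contain `term` (assumed reading)."""
    containing = [Counter(d) for d in docs if term in d]
    total_terms = sum(sum(tf.values()) for tf in containing)
    if total_terms == 0:
        return {}
    counts = Counter()
    for tf in containing:
        for other, n in tf.items():
            if other != term:
                counts[other] += n
    return {o: n / total_terms for o, n in counts.items()}
```

Under this reading the indicators sum to less than one, because the occurrences of the term itself are counted in the normalizer but are not listed as co-occurring terms.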

5. (Original) A method as claimed in Claim 1, comprising assigning each term to one of a
plurality of subsets of terms depending on the frequency of occurrence of the term; and selecting, 
as a cluster attractor, the respective probability distribution of one or more terms from each 
subset of terms. 

6. (Original) A method as claimed in Claim 5, wherein each term is assigned to a subset 
depending on the number of documents of the corpus in which the respective term appears.

7. (Previously presented) A method as claimed in Claim 5, wherein an entropy threshold is 
assigned to each subset, the method comprising selecting, as a cluster attractor, the respective 
probability distribution of one or more terms from each subset having an entropy that satisfies the 
respective entropy threshold. 

8. (Original) A method as claimed in Claim 7, comprising selecting, as a cluster attractor, the 
respective probability distribution of one or more terms from each subset having an entropy that 
is less than or equal to the respective entropy threshold. 

9. (Previously presented) A method as claimed in Claim 5, wherein each subset is associated with 
a frequency range and wherein the frequency ranges for respective subsets are disjoint. 

10. (Previously presented) A method as claimed in Claim 5, wherein each subset is associated 
with a frequency range, the size of each successive frequency range being equal to a constant 
multiplied by the size of the preceding frequency range in order of increasing frequency. 

11. (Previously presented) A method as claimed in Claim 7, wherein the respective entropy 
threshold increases for successive subsets in order of increasing frequency. 

12. (Original) A method as claimed in Claim 11, wherein the respective entropy threshold for
successive subsets increases linearly. 
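
The subset scheme of Claims 5 to 12 might be sketched as below, purely as an illustration. Terms are placed in disjoint document-frequency bands whose widths grow by a constant factor, each band receives an entropy threshold that rises linearly with the band index, and a term is kept when its entropy does not exceed the threshold of its band. All names and default values (`first_width`, `ratio`, `base`, `step`) are hypothetical.

```python
def band_index(doc_freq, first_width, ratio):
    """Index of the band containing doc_freq. Bands start at frequency 1 and
    each band is `ratio` times as wide as the one before it.
    Assumes doc_freq >= 1, first_width >= 1 and ratio >= 1."""
    lower, width, index = 1, first_width, 0
    while doc_freq >= lower + width:
        lower += width
        width *= ratio
        index += 1
    return index

def band_threshold(index, base, step):
    """Entropy threshold that increases linearly with the band index."""
    return base + step * index

def select_by_band(doc_freqs, entropies, first_width=2, ratio=2, base=1.0, step=0.5):
    """Keep terms whose entropy is at or below the threshold of their band."""
    selected = []
    for term, df in doc_freqs.items():
        limit = band_threshold(band_index(df, first_width, ratio), base, step)
        if entropies[term] <= limit:
            selected.append(term)
    return selected
```

With `first_width=2` and `ratio=2` the bands cover document frequencies 1 to 2, 3 to 6, 7 to 14, and so on.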

13. (Cancelled) 

14. (Currently amended) An apparatus for determining cluster attractors for a plurality of 
documents, each document comprising at least one term, each term comprising one or more 
words, the apparatus comprising: means for calculating, in respect of each term, a probability 
distribution indicative of the frequency of occurrence of [[the]] one other term in the instance 
where a document comprises said term and said one other term, [[or]] and in the instance where a
document comprises said term and more than one other term, the respective frequency of
occurrence of each [[,]] other term that co-occurs with said term in at least one of said
documents; means for calculating, in respect of each term, the entropy of the respective
probability distribution; and
means for selecting at least one of said probability distributions as a cluster attractor depending 
on the respective entropy value. 

15. (Previously presented) A method of clustering a plurality of documents, each document 
comprising at least one term, each term comprising one or more words, the method comprising 
determining cluster attractors in accordance with the method of Claim 1; comparing each
document with each cluster attractor; and assigning each document to one or more cluster 
attractors depending on the similarity between the document and the cluster attractors. 

16. (Original) A method as claimed in Claim 15, comprising: calculating, in respect of each 
document, a probability distribution indicative of the frequency of occurrence of each term in the 
document; comparing the respective probability distribution of each document with each 
probability distribution selected as a cluster attractor; and assigning each document to at least one 
cluster depending on the similarity between the compared probability distributions. 
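
The comparison and assignment steps of Claims 15 and 16 might look like the following sketch. The claims do not name a similarity measure; Jensen-Shannon divergence is used here only as an assumed example, and each document is assigned to the single closest attractor for simplicity. The function names and the `attractors` mapping are hypothetical.

```python
import math
from collections import Counter

def term_distribution(doc):
    """Relative frequency of each term in one document (a list of terms)."""
    tf = Counter(doc)
    total = sum(tf.values())
    return {t: n / total for t, n in tf.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions,
    1 for distributions that share no terms."""
    def kl_to_mix(a):
        return sum(pa * math.log2(pa / (0.5 * p.get(t, 0.0) + 0.5 * q.get(t, 0.0)))
                   for t, pa in a.items() if pa > 0)
    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)

def assign(docs, attractors):
    """Map each document index to the name of its least divergent attractor."""
    result = {}
    for i, doc in enumerate(docs):
        p = term_distribution(doc)
        result[i] = min(attractors, key=lambda name: js_divergence(p, attractors[name]))
    return result
```

Assignment to more than one cluster, as Claim 15 permits, could be obtained by keeping every attractor whose divergence falls below a chosen cut-off instead of taking the minimum.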

17. (Previously presented) A method as claimed in Claim 16, comprising organizing the 
documents within each cluster by: assigning a respective weight to each document, the value of
the weight depending on the similarity between the probability distribution of the document and 
the probability distribution of the cluster attractor; comparing the respective probability 
distribution of each document in the cluster with the probability distribution of each other 
document in the cluster; assigning a respective weight to each pair of compared documents, the 
value of the weight depending on the similarity between the compared respective probability 
distributions of each document of the pair; calculating a minimum spanning tree for the cluster 
based on the respective calculated weights. 
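
The final step of Claim 17 might be sketched with Prim's algorithm, shown below as one standard way to compute a minimum spanning tree; the claim does not prescribe an algorithm. In this sketch node 0 is assumed to stand for the cluster attractor and the remaining nodes for the documents of the cluster, and `weight(i, j)` is a hypothetical symmetric function returning the weight calculated for a pair of nodes.

```python
def minimum_spanning_tree(n, weight):
    """Prim's algorithm on a complete graph with nodes 0..n-1.
    Returns a list of (parent, child, weight) edges."""
    if n == 0:
        return []
    # Cheapest known connection of each unattached node to the growing tree.
    best = {j: (weight(0, j), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        w, i = best.pop(j)
        edges.append((i, j, w))
        # The newly attached node may offer cheaper connections to the rest.
        for k in best:
            wk = weight(j, k)
            if wk < best[k][0]:
                best[k] = (wk, j)
    return edges
```

If the calculated weights grow with similarity rather than with distance, they would need to be inverted or negated before being passed to this function.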

18. (Currently amended) A computer-implemented method of clustering a plurality of
documents, each document comprising at least one term, each term comprising one or more 
words, the method including: causing a computer to calculate, in respect of each term, a 
probability distribution indicative of the frequency of occurrence of [[the]] one other term in the 
instance where a document comprises said term and said one other term, [[or]] and in the
instance where a document comprises said term and more than one other term, the respective
frequency of occurrence of each [[,]] other term that co-occurs with said term in at least one of
said documents; causing the computer to calculate, in respect of each term, the entropy of the 
respective probability distribution; causing the computer to select at least one of said probability 
distributions as a cluster attractor depending on the respective entropy value. 